Subtleties in Tolerating Correlated Failures
Abstract
High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today’s wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using a combination of experimental and mathematical analysis of several real-world failure traces, we debunk four common myths about how to design systems to tolerate such failures. Based on our analysis, we identify a set of design principles that system builders can use to build services that tolerate correlated failures. We show how these lessons can be effectively used by incorporating them into ALI, a distributed read-write storage layer that provides high availability. Our results using ALI on PlanetLab over the past 8 months demonstrate its ability to withstand large correlated failures and meet preconfigured availability targets.
Similar resources
Subtleties in Tolerating Correlated Failures in Wide-area Storage Systems
High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today’s wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using several real-world failure traces, we qualitatively answer four important questi...
Triple-Star: a Coding Scheme with Optimal Encoding Complexity for Tolerating Triple Disk Failures in RAID
Low encoding/decoding complexity is essential for practical storage systems. This paper presents a new Maximum Distance Separable (MDS) array code, called Triple-Star, for tolerating triple disk failures in Redundant Arrays of Inexpensive Disks (RAID) architectures. Triple-Star is an extension of the double-erasure-correcting Rotary-code and a modification of the generalized triple-erasure-corre...
α-Register
It is well known that in an asynchronous message-passing system, one can emulate an atomic register provided that more than half of the processes are non-faulty. By contrast, when a majority of the processes may fail, simulating an atomic register is not possible. This paper investigates weak variants of atomic registers that can be simulated while tolerating failures of a majority of processes. Specifica...
Security Requirements for Tolerating Security Failures
This paper describes security failure-tolerant requirements, which tolerate the failures of security services that protect applications from security attacks. A security service, such as an authentication, confidentiality, or integrity service, can always be broken as more advanced attack techniques are devised. There is no security service that is forever secure. This paper describes an approa...